{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Ex3 - Getting and Knowing your Data\n",
    "\n",
    "Check out [Occupation Exercises Video Tutorial](https://www.youtube.com/watch?v=W8AB5s-L3Rw&list=PLgJhDSE2ZLxaY_DigHeiIDC1cD09rXgJv&index=4) to watch a data scientist go through the exercises"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This time we are going to pull data directly from the internet.\n",
    "Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.\n",
    "\n",
    "### Step 1. Import the necessary libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.user). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 3. Assign it to a variable called users and use the 'user_id' as index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "users = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.user', \n",
    "                      sep='|', index_col='user_id')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 4. See the first 25 entries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>gender</th>\n",
       "      <th>occupation</th>\n",
       "      <th>zip_code</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>24</td>\n",
       "      <td>M</td>\n",
       "      <td>technician</td>\n",
       "      <td>85711</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>53</td>\n",
       "      <td>F</td>\n",
       "      <td>other</td>\n",
       "      <td>94043</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>23</td>\n",
       "      <td>M</td>\n",
       "      <td>writer</td>\n",
       "      <td>32067</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>24</td>\n",
       "      <td>M</td>\n",
       "      <td>technician</td>\n",
       "      <td>43537</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>33</td>\n",
       "      <td>F</td>\n",
       "      <td>other</td>\n",
       "      <td>15213</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>42</td>\n",
       "      <td>M</td>\n",
       "      <td>executive</td>\n",
       "      <td>98101</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>57</td>\n",
       "      <td>M</td>\n",
       "      <td>administrator</td>\n",
       "      <td>91344</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>36</td>\n",
       "      <td>M</td>\n",
       "      <td>administrator</td>\n",
       "      <td>05201</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>29</td>\n",
       "      <td>M</td>\n",
       "      <td>student</td>\n",
       "      <td>01002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>53</td>\n",
       "      <td>M</td>\n",
       "      <td>lawyer</td>\n",
       "      <td>90703</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>39</td>\n",
       "      <td>F</td>\n",
       "      <td>other</td>\n",
       "      <td>30329</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>28</td>\n",
       "      <td>F</td>\n",
       "      <td>other</td>\n",
       "      <td>06405</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>47</td>\n",
       "      <td>M</td>\n",
       "      <td>educator</td>\n",
       "      <td>29206</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>45</td>\n",
       "      <td>M</td>\n",
       "      <td>scientist</td>\n",
       "      <td>55106</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>49</td>\n",
       "      <td>F</td>\n",
       "      <td>educator</td>\n",
       "      <td>97301</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>21</td>\n",
       "      <td>M</td>\n",
       "      <td>entertainment</td>\n",
       "      <td>10309</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>30</td>\n",
       "      <td>M</td>\n",
       "      <td>programmer</td>\n",
       "      <td>06355</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>35</td>\n",
       "      <td>F</td>\n",
       "      <td>other</td>\n",
       "      <td>37212</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>40</td>\n",
       "      <td>M</td>\n",
       "      <td>librarian</td>\n",
       "      <td>02138</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>42</td>\n",
       "      <td>F</td>\n",
       "      <td>homemaker</td>\n",
       "      <td>95660</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>26</td>\n",
       "      <td>M</td>\n",
       "      <td>writer</td>\n",
       "      <td>30068</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>25</td>\n",
       "      <td>M</td>\n",
       "      <td>writer</td>\n",
       "      <td>40206</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>30</td>\n",
       "      <td>F</td>\n",
       "      <td>artist</td>\n",
       "      <td>48197</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>21</td>\n",
       "      <td>F</td>\n",
       "      <td>artist</td>\n",
       "      <td>94533</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>39</td>\n",
       "      <td>M</td>\n",
       "      <td>engineer</td>\n",
       "      <td>55107</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         age gender     occupation zip_code\n",
       "user_id                                    \n",
       "1         24      M     technician    85711\n",
       "2         53      F          other    94043\n",
       "3         23      M         writer    32067\n",
       "4         24      M     technician    43537\n",
       "5         33      F          other    15213\n",
       "6         42      M      executive    98101\n",
       "7         57      M  administrator    91344\n",
       "8         36      M  administrator    05201\n",
       "9         29      M        student    01002\n",
       "10        53      M         lawyer    90703\n",
       "11        39      F          other    30329\n",
       "12        28      F          other    06405\n",
       "13        47      M       educator    29206\n",
       "14        45      M      scientist    55106\n",
       "15        49      F       educator    97301\n",
       "16        21      M  entertainment    10309\n",
       "17        30      M     programmer    06355\n",
       "18        35      F          other    37212\n",
       "19        40      M      librarian    02138\n",
       "20        42      F      homemaker    95660\n",
       "21        26      M         writer    30068\n",
       "22        25      M         writer    40206\n",
       "23        30      F         artist    48197\n",
       "24        21      F         artist    94533\n",
       "25        39      M       engineer    55107"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.head(25)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 5. See the last 10 entries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>gender</th>\n",
       "      <th>occupation</th>\n",
       "      <th>zip_code</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>934</th>\n",
       "      <td>61</td>\n",
       "      <td>M</td>\n",
       "      <td>engineer</td>\n",
       "      <td>22902</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>935</th>\n",
       "      <td>42</td>\n",
       "      <td>M</td>\n",
       "      <td>doctor</td>\n",
       "      <td>66221</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>936</th>\n",
       "      <td>24</td>\n",
       "      <td>M</td>\n",
       "      <td>other</td>\n",
       "      <td>32789</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>937</th>\n",
       "      <td>48</td>\n",
       "      <td>M</td>\n",
       "      <td>educator</td>\n",
       "      <td>98072</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>938</th>\n",
       "      <td>38</td>\n",
       "      <td>F</td>\n",
       "      <td>technician</td>\n",
       "      <td>55038</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>939</th>\n",
       "      <td>26</td>\n",
       "      <td>F</td>\n",
       "      <td>student</td>\n",
       "      <td>33319</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>940</th>\n",
       "      <td>32</td>\n",
       "      <td>M</td>\n",
       "      <td>administrator</td>\n",
       "      <td>02215</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>941</th>\n",
       "      <td>20</td>\n",
       "      <td>M</td>\n",
       "      <td>student</td>\n",
       "      <td>97229</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>942</th>\n",
       "      <td>48</td>\n",
       "      <td>F</td>\n",
       "      <td>librarian</td>\n",
       "      <td>78209</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>943</th>\n",
       "      <td>22</td>\n",
       "      <td>M</td>\n",
       "      <td>student</td>\n",
       "      <td>77841</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         age gender     occupation zip_code\n",
       "user_id                                    \n",
       "934       61      M       engineer    22902\n",
       "935       42      M         doctor    66221\n",
       "936       24      M          other    32789\n",
       "937       48      M       educator    98072\n",
       "938       38      F     technician    55038\n",
       "939       26      F        student    33319\n",
       "940       32      M  administrator    02215\n",
       "941       20      M        student    97229\n",
       "942       48      F      librarian    78209\n",
       "943       22      M        student    77841"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.tail(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 6. What is the number of observations in the dataset?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "943"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.shape[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 7. What is the number of columns in the dataset?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.shape[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 8. Print the name of all the columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['age', 'gender', 'occupation', 'zip_code'], dtype='object')"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 9. How is the dataset indexed?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Int64Index([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,\n",
       "            ...\n",
       "            934, 935, 936, 937, 938, 939, 940, 941, 942, 943],\n",
       "           dtype='int64', name='user_id', length=943)"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# \"the index\" (aka \"the labels\")\n",
    "users.index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 10. What is the data type of each column?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "age            int64\n",
       "gender        object\n",
       "occupation    object\n",
       "zip_code      object\n",
       "dtype: object"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 11. Print only the occupation column"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "user_id\n",
       "1         technician\n",
       "2              other\n",
       "3             writer\n",
       "4         technician\n",
       "5              other\n",
       "6          executive\n",
       "7      administrator\n",
       "8      administrator\n",
       "9            student\n",
       "10            lawyer\n",
       "11             other\n",
       "12             other\n",
       "13          educator\n",
       "14         scientist\n",
       "15          educator\n",
       "16     entertainment\n",
       "17        programmer\n",
       "18             other\n",
       "19         librarian\n",
       "20         homemaker\n",
       "21            writer\n",
       "22            writer\n",
       "23            artist\n",
       "24            artist\n",
       "25          engineer\n",
       "26          engineer\n",
       "27         librarian\n",
       "28            writer\n",
       "29        programmer\n",
       "30           student\n",
       "           ...      \n",
       "914            other\n",
       "915    entertainment\n",
       "916         engineer\n",
       "917          student\n",
       "918        scientist\n",
       "919            other\n",
       "920           artist\n",
       "921          student\n",
       "922    administrator\n",
       "923          student\n",
       "924            other\n",
       "925         salesman\n",
       "926    entertainment\n",
       "927       programmer\n",
       "928          student\n",
       "929        scientist\n",
       "930        scientist\n",
       "931         educator\n",
       "932         educator\n",
       "933          student\n",
       "934         engineer\n",
       "935           doctor\n",
       "936            other\n",
       "937         educator\n",
       "938       technician\n",
       "939          student\n",
       "940    administrator\n",
       "941          student\n",
       "942        librarian\n",
       "943          student\n",
       "Name: occupation, Length: 943, dtype: object"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.occupation\n",
    "\n",
    "#or\n",
    "\n",
    "users['occupation']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 12. How many different occupations are in this dataset?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "21"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.occupation.nunique()\n",
    "#or by using value_counts() which returns the count of unique elements\n",
    "#users.occupation.value_counts().count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 13. What is the most frequent occupation?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'student'"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Because \"most\" is asked\n",
    "users.occupation.value_counts().head(1).index[0]\n",
    "\n",
    "#or\n",
    "#to have the top 5\n",
    "\n",
    "# users.occupation.value_counts().head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 14. Summarize the DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>943.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>34.051962</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>12.192740</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>7.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>25.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>31.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>43.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>73.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              age\n",
       "count  943.000000\n",
       "mean    34.051962\n",
       "std     12.192740\n",
       "min      7.000000\n",
       "25%     25.000000\n",
       "50%     31.000000\n",
       "75%     43.000000\n",
       "max     73.000000"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.describe() #Notice: by default, only the numeric columns are returned. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 15. Summarize all the columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>gender</th>\n",
       "      <th>occupation</th>\n",
       "      <th>zip_code</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>943.000000</td>\n",
       "      <td>943</td>\n",
       "      <td>943</td>\n",
       "      <td>943</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>unique</th>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>21</td>\n",
       "      <td>795</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>top</th>\n",
       "      <td>NaN</td>\n",
       "      <td>M</td>\n",
       "      <td>student</td>\n",
       "      <td>55414</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>freq</th>\n",
       "      <td>NaN</td>\n",
       "      <td>670</td>\n",
       "      <td>196</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>34.051962</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>12.192740</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>7.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>25.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>31.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>43.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>73.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               age gender occupation zip_code\n",
       "count   943.000000    943        943      943\n",
       "unique         NaN      2         21      795\n",
       "top            NaN      M    student    55414\n",
       "freq           NaN    670        196        9\n",
       "mean     34.051962    NaN        NaN      NaN\n",
       "std      12.192740    NaN        NaN      NaN\n",
       "min       7.000000    NaN        NaN      NaN\n",
       "25%      25.000000    NaN        NaN      NaN\n",
       "50%      31.000000    NaN        NaN      NaN\n",
       "75%      43.000000    NaN        NaN      NaN\n",
       "max      73.000000    NaN        NaN      NaN"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.describe(include = \"all\") #Notice: By default, only the numeric columns are returned."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 16. Summarize only the occupation column"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count         943\n",
       "unique         21\n",
       "top       student\n",
       "freq          196\n",
       "Name: occupation, dtype: object"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.occupation.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 17. What is the mean age of users?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "34"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "round(users.age.mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 18. What is the age with least occurrence?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "11    1\n",
       "10    1\n",
       "73    1\n",
       "66    1\n",
       "7     1\n",
       "Name: age, dtype: int64"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.age.value_counts().tail() #7, 10, 11, 66 and 73 years -> only 1 occurrence"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
